Dirty-paper Coding without Channel Information 
at the Transmitter and Imperfect 
Estimation at the Receiver 

Pablo Piantanida and Pierre Duhamel 
Laboratoire des Signaux et Systemes, CNRS/Supelec, 
F-91192 Gif-sur-Yvette, France 
Email: {piantanida,pierre.duhamel}@lss.supelec.fr



Abstract — In this paper, we examine the effects of imperfect 
channel estimation at the receiver and no channel knowledge at 
the transmitter on the capacity of the fading Costa's channel with 
channel state information non-causally known at the transmitter. 
We derive the optimal Dirty-paper coding (DPC) scheme and its 
corresponding achievable rates with the assumption of Gaussian 
inputs. Our results, for uncorrelated Rayleigh fading, provide 
intuitive insights on the impact of the channel estimate and 
the channel characteristics (e.g. SNR, fading process, channel 
training) on the achievable rates. These are useful in practical 
scenarios of multiuser wireless communications (e.g. Broadcast 
Channels) and information embedding applications (e.g. robust 
watermarking). We also study optimal training design adapted
to each application. We provide numerical results for a single-user
fading Costa's channel with maximum-likelihood (ML) channel
estimation. These illustrate an interesting practical trade-off
between the amount of training and its impact on the interference
cancellation performance of the DPC scheme.

I. Introduction 

Consider the problem of communicating over a Gaussian
channel corrupted by an additive Gaussian interfering signal
that is non-causally known at the transmitter. This variation
of the conventional additive white Gaussian noise (AWGN)
channel is commonly known as the channel with state information
at the transmitter. The state $S$ is a Gaussian random variable
with power $Q$, independent of the Gaussian noise $Z$. The
transmitter sends a message $m \in \{1, \dots, \lfloor 2^{nR} \rfloor\}$ and the
channel output is $Y = X + S + Z$, where $X$ is the channel input and
$R$ is the rate in bits per transmission. The capacity expression of
single-user channels with random parameters was derived by Gel'fand and
Pinsker in [1]. The authors show that the capacity of such a
channel $\{W(y|x,s), x \in \mathcal{X}, s \in \mathcal{S}\}$ with state information $S$
non-causally available at the transmitter is

$$C = \sup_{P(u,x|s)} \{ I(U;Y) - I(U;S) \}, \tag{1}$$

where $U$ is an auxiliary random variable chosen so that
$U \leftrightarrow (X,S) \leftrightarrow Y$ form a Markov chain and
$P(u,x|s) = \delta(x - f(u,s))\,P(u|s)$.



In "Writing on Dirty Paper" [2], Costa applied this result to 
an AWGN channel corrupted by an additive white Gaussian
interfering signal $S$. He showed that capacity is achieved by choosing
$U = X + \alpha S$ with an appropriate value for $\alpha$, namely
$\alpha^* = P/(P + \sigma_Z^2)$, $\sigma_Z^2$ being the AWGN variance. This coding
scheme, referred to as Dirty-paper coding (DPC), allows one to achieve
the same capacity as if the interfering signal $S$ were not present, i.e.
$C = \frac{1}{2}\log_2(1 + P/\sigma_Z^2)$. This result has gained considerable
attention in recent years, mainly because of its potential
use in communication scenarios where interference cancellation
at the transmitter is needed. In particular, multiuser
interference cancellation for Broadcast Channels (BC) and
information embedding (digital watermarking for multimedia
security applications) are instances of such scenarios.
In recent years, the Gaussian Multiple-Input-Multiple-Output
Broadcast Channel (MIMO-BC) has been extensively
studied. In [3], the authors established a DPC-based
achievable rate region, referred to as the Dirty-paper coding
region. Recently, in [4], the DPC region was proved to be
equal to the capacity region.
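
As a numerical illustration of Costa's result, the following Python sketch evaluates the rate $I(U;Y) - I(U;S)$ obtained with $U = X + \alpha S$ on the real Gaussian channel $Y = X + S + Z$, and checks that $\alpha^* = P/(P + \sigma_Z^2)$ recovers the interference-free capacity. The function name `costa_rate` and the power values are assumptions made for this example only.

```python
import math

def costa_rate(alpha, P, Q, N):
    """I(U;Y) - I(U;S) in bits for U = X + alpha*S on Y = X + S + Z
    (real Gaussian case), with input power P, state power Q, noise power N."""
    num = P * (P + Q + N)
    den = P * Q * (1.0 - alpha) ** 2 + N * (P + alpha ** 2 * Q)
    return 0.5 * math.log2(num / den)

# Assumed illustrative powers: the state is 20 dB stronger than the input.
P, Q, N = 1.0, 100.0, 0.5
alpha_star = P / (P + N)

rate_dpc = costa_rate(alpha_star, P, Q, N)      # DPC with the optimal alpha
rate_clean = 0.5 * math.log2(1.0 + P / N)       # AWGN capacity without S
rate_naive = costa_rate(0.0, P, Q, N)           # alpha = 0 ignores the state
```

With $\alpha = 0$ the state acts as additional noise, so `rate_naive` falls well below `rate_dpc`, whereas `rate_dpc` and `rate_clean` agree up to rounding.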

Most of the literature focuses on the information-theoretic
performance of DPC under the assumption that perfect channel
information is available at both transmitter and receiver.
However, it is well known that the performance of wireless
systems is severely affected if only a noisy estimate that
differs from the true channel is available (cf. [5], [6] and [7]).
Of particular interest is the effect of this imperfect
channel knowledge when interference cancellation or Dirty-paper
coding is used. The problem may be even more serious in
practical situations where no channel information is available
at the transmitter, i.e., when the channel estimates are not fed
back from the receiver to the transmitter.

Throughout this paper, we consider a wireless or watermarked
channel modeled as $Y = H(X + S) + Z$, where
$H$ is the random channel, which neither the transmitter nor
the receiver knows. We assume that the receiver estimates $H$
during a phase of independent training, by using maximum-likelihood
(ML) channel estimation (Section III), whereas the
transmitter does not know this estimate. We then observe that,
depending on the targeted application, e.g. Broadcast Channel
or robust watermarking, two different training scenarios are
relevant. In this work, we determine the tradeoff between the
amount of training required for channel estimation and the
corresponding achievable rates using DPC (Section IV). We
address this problem through the notion of reliable communication
based on the average of the error probability over all
channel estimation errors. This allows us to establish an equivalence
with the capacity of a composite (more noisy) channel. Our
proposed framework is sufficiently general to cover the most
important information embedding and multiuser communication
scenarios. Finally, Section V uses a Rayleigh-fading
Costa's channel to illustrate average rates over all estimates,
for different amounts of training.

II. Channel model 

First consider a general model for communication under
channel uncertainty over discrete memoryless channels
(DMCs) with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$ and
channel states $\mathcal{S}$ (cf. [1] and [8]). A specific instance of the
unknown channel is characterized by a transition probability
mass (PM) $W(\cdot|x,s,\theta) \in \mathcal{W}_\Theta$ with a random state $s \in \mathcal{S}$
perfectly known by the transmitter and a fixed but unknown
channel $\theta \in \Theta \subset \mathbb{C}^d$. Here,
$\mathcal{W}_\Theta = \{W(\cdot|x,s,\theta) : x \in \mathcal{X}, s \in \mathcal{S}, \theta \in \Theta\}$
is a family of conditional transition PMs
on $\mathcal{Y}$ parameterized by a vector $\theta \in \Theta$, which follows i.i.d.
$\theta \sim \psi(\theta)$. It is assumed that the receiver only knows an
estimate $\hat\theta$ of the channel and a characterization of the estimator
performance in terms of the conditional probability density
function (pdf) $\psi(\theta|\hat\theta)$ (this can be obtained using $\mathcal{W}_\Theta$ and the
a priori distribution of $\theta$). The transmitter, on the other hand, does
not know the estimate $\hat\theta$; it only knows its statistic $\psi(\hat\theta)$. The
extension of the DMC $W(\cdot|x,s,\theta)$ to $n$ channel uses within
a block is given by
$W^n(\mathbf{y}|\mathbf{x},\mathbf{s},\theta) = \prod_{i=1}^{n} W(y_i|x_i,s_i,\theta)$,
where $\mathbf{x} = (x_1, \dots, x_n)$, $\mathbf{s} = (s_1, \dots, s_n)$ with each $s_i$ an i.i.d.
realization of $P_S(s)$, and $\mathbf{y} = (y_1, \dots, y_n)$. It is assumed
that the state sequence $\mathbf{s}$ is perfectly known at the transmitter
before sending $\mathbf{x}$ and unknown at the receiver.

Throughout this paper we consider a memoryless fading
Costa channel. The discrete-time channel at time $t$ is

$$Y(t) = H(t)\,\big(X(t) + S(t)\big) + Z(t), \tag{2}$$

where $X(t) \in \mathbb{C}$ is the transmitted symbol and $Y(t) \in \mathbb{C}$ is
the received symbol. Here, $H(t) \in \mathbb{C}$ is the complex random
channel ($\theta = H$) whose entries are independent identically
distributed (i.i.d.) zero-mean circularly symmetric complex
Gaussian (ZMCSCG) random variables $\mathcal{CN}(0, \sigma_H^2)$. The noise
$Z(t) \in \mathbb{C}$ consists of i.i.d. ZMCSCG random variables with
variance $\sigma_Z^2$. The channel state $S(t) \in \mathbb{C}$ consists of i.i.d.
ZMCSCG random variables with variance $Q$. The quantities
$H(t)$, $Z(t)$, $S(t)$ are assumed to be ergodic and stationary random
processes, and the channel $H(t)$ is independent of $S(t)$,
$X(t)$ and $Z(t)$. This leads to a stationary, discrete-time
memoryless channel $W(y|x,s,H)$ with pdf

$$W(y|x,s,H) = \mathcal{CN}\big(H(x + s), \sigma_Z^2\big). \tag{3}$$

The average symbol energy at the transmitter is constrained
to satisfy $E_X\{X(t)X(t)^\dagger\} \le P$, where $(\cdot)^\dagger$ denotes Hermitian
transposition. In practical situations, only a noisy estimate
$\hat\theta = \hat H$ that differs from the true channel is available at
the receiver. We next focus on training sequence design for
channel estimation.

III. Optimal design of channel training 

A standard technique to allow the receiver to estimate the
channel consists of transmitting training sequences, i.e.,
a set of symbols whose location and values are known to the
receiver. We assume that the channel is constant during the
transmission of an entire codeword so that the transmitter,
before sending the data $\mathbf{x}$, sends a training sequence of $N$
symbols $\mathbf{x}_T = (x_{T,1}, \dots, x_{T,N})$. The average energy per
training symbol is $P_T = \frac{1}{N}\mathrm{tr}(\mathbf{x}_T \mathbf{x}_T^\dagger)$. Two different
scenarios are relevant:

(i) The channel affects the training sequence only, i.e. the
decoder observes $\mathbf{y}_T = H\mathbf{x}_T + \mathbf{z}_T$, where $\mathbf{z}_T$ is the noise
affecting the transmission of training symbols. This scenario
arises, e.g., in Broadcast Channels where the transmitter does
not send the state sequence during the training phase. In that
case, an optimal training is obtained by sending an arbitrary
constant symbol, $x_{T,i} = x_0$ for all $i = 1, \dots, N$, so that a
maximum-likelihood (ML) estimate $\hat\theta = \hat H_{ML}$ is obtained at the
decoder from the observed output. The ML estimate of $H$ is
given [7] by

$$\hat H_{ML} = (\mathbf{x}_T^\dagger \mathbf{x}_T)^{-1} \mathbf{x}_T^\dagger \mathbf{y}_T = H + \mathcal{E}, \tag{4}$$

where $\mathcal{E} = (\mathbf{x}_T^\dagger \mathbf{x}_T)^{-1} \mathbf{x}_T^\dagger \mathbf{z}_T$ is the estimation error with variance

$$\sigma_{\mathcal{E}}^2 = \mathrm{SNR}_T^{-1} \quad \text{and} \quad \mathrm{SNR}_T = \frac{N P_T}{\sigma_Z^2}. \tag{5}$$
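
The error variance implied by (4), namely $\sigma_Z^2/(N P_T)$ for a constant training symbol, can be checked with a short Monte Carlo sketch. The helper names (`complex_gauss`, `ml_estimate`) and the numerical values below are assumptions chosen for illustration; only the estimator formula itself is taken from (4).

```python
import random

def complex_gauss(variance, rng):
    """One zero-mean circularly symmetric complex Gaussian sample."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def ml_estimate(h, x_t, sigma_z2, rng):
    """ML estimate (x^H x)^{-1} x^H y computed from y = h * x + z."""
    y_t = [h * x + complex_gauss(sigma_z2, rng) for x in x_t]
    energy = sum(abs(x) ** 2 for x in x_t)
    return sum(x.conjugate() * y for x, y in zip(x_t, y_t)) / energy

rng = random.Random(0)
N, P_T, sigma_z2 = 10, 1.0, 1.0              # assumed training setup
x_t = [complex(P_T ** 0.5, 0.0)] * N         # constant training symbol x0
h = complex_gauss(1.0, rng)                  # one fading realization

trials = 20000
err_var = sum(abs(ml_estimate(h, x_t, sigma_z2, rng) - h) ** 2
              for _ in range(trials)) / trials
predicted = sigma_z2 / (N * P_T)             # 1 / SNR_T
```

The empirical value `err_var` should be close to `predicted`, and increasing `N` or `P_T` reduces both in the same proportion.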



(ii) The channel affects both the training sequence and the
state sequence, which is unknown at the receiver, i.e. the
decoder observes $\mathbf{y}_T = H(\mathbf{x}_T + \mathbf{s}_T) + \mathbf{z}_T$, where $\mathbf{s}_T$ is the
state sequence affecting the channel as multiplicative noise.
This scenario arises in robust digital watermarking, where the
channel represents an unknown multiplicative attack on the host
signal $\mathbf{s}_T$ that is used for training. Here, because of the presence
of $\mathbf{s}_T$ with average energy per symbol $Q \gg P_T$, the scenario
is much more complicated than (i). As a consequence,
a different method for channel estimation is needed.

We note that the transmitter, before sending the training
sequence, perfectly knows the state sequence $\mathbf{s}_T$. Therefore,
it can adapt the training sequence to reduce
the multiplicative noise at the transmitter. Consider the mean
estimator $\hat H_\Delta = \langle \mathbf{y}_T \rangle = H D + \langle \mathbf{z}_T \rangle$, where
$D = \langle \mathbf{x}_T \rangle + \langle \mathbf{s}_T \rangle$ and $\langle \cdot \rangle$ denotes the mean operator.
Obviously, if for some length $N$ the transmitter has enough power $P_T$ to get
$D = 1$, the interference could be completely removed from $\mathbf{y}_T$.
Of course, this is not possible for all sequences $\mathbf{s}_T$, and only
part of these sequences can be removed. We can state this
more formally as the following optimization problem. Given
some arbitrary pair $(\Delta, \gamma)$ with $0 < (\Delta, \gamma) < 1$, we find the
optimal training sequence $\mathbf{x}_T$ and its required length $N^*$ such
that

$$\begin{cases} \text{Minimize } \|\mathbf{x}_T\|^2 / N, \\ \text{subject to } \Pr_{\mathbf{s}_T}\big( D^2 < (1 - \Delta) P_T \big) \le \gamma, \end{cases} \tag{6}$$

where $(1 - \Delta)P_T$ is the remainder power after removing
$\mathbf{s}_T$. This means that for $100 \times (1 - \gamma)\%$ of estimations the
interference can be removed; otherwise the training fails. We
call $\gamma$ the failure tolerance level. Then, the solution of (6) is
easily found to be $\mathbf{x}_T^*(\mathbf{s}_T) = (x_0, \dots, x_0)$ with

$$x_0 = \sqrt{(1 - \Delta) P_T} - \langle \mathbf{s}_T \rangle \quad \text{if } \|\mathbf{x}_T^*(\mathbf{s}_T)\|^2 \le N P_T, \tag{7}$$

(elsewise the training fails), and $N^*$ is chosen such that
$\Pr_{\mathbf{s}_T}\big( \|\mathbf{x}_T^*(\mathbf{s}_T)\|^2 > N^* P_T \big) \le \gamma$.
It follows that $N^*$ can be computed by using the cumulative
distribution function of a non-central chi-square with two degrees of freedom,
$\mathrm{cdf}\big(r; 2, 2N^* P_T (1 - \Delta) Q^{-1}\big) = 1 - \gamma$ with $r = 2N^* P_T Q^{-1}$.
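
The computation of $N^*$ from the non-central chi-square cdf can be sketched as follows. This is a minimal sketch under stated assumptions: the cdf is evaluated through its standard Poisson-mixture expansion (suitable for moderate arguments only), the smallest admissible $N$ is found by a linear scan, and the function names and the values of $\Delta$, $\gamma$ and $P_T/Q$ are hypothetical choices for illustration.

```python
import math

def ncx2_cdf_2dof(r, lam):
    """CDF at r of a non-central chi-square variable with 2 degrees of
    freedom and non-centrality lam, computed as a Poisson mixture of
    central chi-square CDFs with 2, 4, 6, ... degrees of freedom."""
    half_r, half_lam = r / 2.0, lam / 2.0
    terms = int(half_lam + 10.0 * math.sqrt(half_lam) + 50)
    poisson = math.exp(-half_lam)   # Poisson weight for j = 0
    term = math.exp(-half_r)        # e^{-r/2} (r/2)^i / i! for i = 0
    tail = term                     # P(chi2 with 2(j+1) dof > r) for j = 0
    total = 0.0
    for j in range(terms):
        total += poisson * (1.0 - tail)
        poisson *= half_lam / (j + 1)
        term *= half_r / (j + 1)
        tail += term
    return total

def required_training_length(delta, gamma, pt_over_q, n_max=20000):
    """Smallest N with cdf(r; 2, lam) >= 1 - gamma, where r = 2 N P_T / Q
    and lam = r (1 - delta); returns None if no N <= n_max qualifies."""
    for n in range(1, n_max + 1):
        r = 2.0 * n * pt_over_q
        lam = r * (1.0 - delta)
        if ncx2_cdf_2dof(r, lam) >= 1.0 - gamma:
            return n
    return None

# Assumed values: Q is 20 dB above P_T, Delta = 0.98, gamma = 0.01.
n_star = required_training_length(delta=0.98, gamma=0.01, pt_over_q=0.01)
```

A linear scan is used because it is simple and the cdf evaluation is cheap; a bisection over $N$ would be an alternative if the failure probability is known to decrease monotonically in $N$.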



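Actually, the channel estimate can be written as

$$\hat H_\Delta = H + \mathcal{E}_\Delta, \tag{8}$$

where $\mathcal{E}_\Delta = \big((1 - \Delta) P_T\big)^{-1/2} \langle \mathbf{z}_T \rangle$ is the estimation error with variance

$$\sigma_{\mathcal{E}_\Delta}^2 = \mathrm{SNR}_{T\Delta}^{-1} \quad \text{and} \quad \mathrm{SNR}_{T\Delta} = \frac{N (1 - \Delta) P_T}{\sigma_Z^2}. \tag{9}$$

Note that $\sigma_{\mathcal{E}_\Delta}^2 = (1 - \Delta)^{-1} \sigma_{\mathcal{E}}^2$, where $\sigma_{\mathcal{E}}^2$ is the estimation
error variance in (i). To compare both estimation scenarios, we define
the noise reduction factor $\eta = (N(1 - \Delta))^{-1}$.

Fig. 1. Noise reduction factor $\eta$ vs the training sequence length $N$, for
various probabilities $\gamma$.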

Fig. 1 shows the noise reduction factor $\eta$ versus the training
sequence length $N$, for various failure tolerance levels $\gamma$.
The power of the state sequence $Q$ is
20 dB larger than that of the training sequence
$P_T$. Let us suppose that, e.g., we want to get an estimation
error 10 times less than the channel noise (i.e. $\eta = 10^{-1}$) at
a given failure tolerance level $\gamma$. From Fig. 1 we can observe
that the required training length is $N = 500$, whereas in (i),
where the state sequence is not present during the training,
equal performance would only require $N = 10$.

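Finally, we characterize both channel estimation performances
in terms of the a posteriori pdf of $H$ given $\hat H_{ML}$
and the pdf of $H$ given $\hat H_\Delta$. These pdfs will be needed in
the next section to derive a composite channel model and its
achievable rates. Using the fading pdf, the expressions (4) and
(8), and some algebra, we obtain

$$\begin{cases} \psi_{H|\hat H_{ML}}(H|\hat H_{ML}) = \mathcal{CN}\big(\delta \hat H_{ML}, \delta \sigma_{\mathcal{E}}^2\big), \\ \psi_{H|\hat H_\Delta}(H|\hat H_\Delta) = \mathcal{CN}\big(\delta_\Delta \hat H_\Delta, \delta_\Delta \sigma_{\mathcal{E}_\Delta}^2\big), \end{cases} \tag{10}$$

where $\delta = \sigma_H^2 (\sigma_H^2 + \sigma_{\mathcal{E}}^2)^{-1}$ and $\delta_\Delta = \sigma_H^2 (\sigma_H^2 + \sigma_{\mathcal{E}_\Delta}^2)^{-1}$.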
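
The parameters of the a posteriori pdf in (10) follow the usual Gaussian conditioning rule for $\hat H = H + \mathcal{E}$ with independent $H$ and $\mathcal{E}$. The small sketch below computes them; the function name `posterior_of_channel` and the sample values are assumptions for illustration.

```python
def posterior_of_channel(h_hat, sigma_h2, sigma_e2):
    """Mean and variance of H given the estimate h_hat = H + E, with
    H ~ CN(0, sigma_h2) and E ~ CN(0, sigma_e2) independent."""
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    return delta * h_hat, delta * sigma_e2

# Assumed example: unit-variance fading and an estimation error of equal power.
mean, var = posterior_of_channel(complex(1.0, 1.0), sigma_h2=1.0, sigma_e2=1.0)

# With a very accurate estimate the posterior concentrates on h_hat.
mean_good, var_good = posterior_of_channel(complex(1.0, 1.0), 1.0, 1e-9)
```

The posterior variance $\delta\sigma_{\mathcal{E}}^2$ equals $\sigma_H^2\sigma_{\mathcal{E}}^2/(\sigma_H^2 + \sigma_{\mathcal{E}}^2)$, so it never exceeds either the fading variance or the estimation error variance.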

IV. Main results 

In this section, we first introduce the notion of reliable 
communication based on the average of the error probability 
over all channel estimation errors. This notion, for DMCs 
with state information non-causally known at the transmitter, 
allows us to consider the capacity of a composite (more noisy) 
channel. Then, we find the optimal DPC scheme and its
achievable rates for the channel described in Section II with
imperfect channel estimation (see Section III).

A. Problem Definition and Coding Theorem 

A message $m$ from the set $\mathcal{M} = \{1, \dots, \lfloor 2^{nR_{\hat\theta}} \rfloor\}$ is
transmitted using a length-$n$ block code defined as a pair $(\varphi, \phi)$
of mappings, where $\varphi : \mathcal{M} \times \mathcal{S}^n \to \mathcal{X}^n$ is the encoder,
and $\phi : \mathcal{Y}^n \times \Theta \to \mathcal{M} \cup \{0\}$ is the decoder (that utilizes $\hat\theta$).
Note that the encoder uses the realization of the state sequence
$\mathbf{s}$. This knowledge is exploited for encoding the information
messages $m \in \mathcal{M}$. The rate, which depends on the channel
estimate $\hat\theta$ through its decoder, is given by $n^{-1} \log_2 M_{\hat\theta}$. The
maximum (over all messages) of the average of the error
probability over all channel estimation errors is

$$e_{\max}(\varphi, \phi, \hat\theta) = \max_{m \in \mathcal{M}} E_{\theta|\hat\theta}\Big[ \sum_{\mathbf{y} \in \mathcal{Y}^n :\, \phi(\mathbf{y}, \hat\theta) \ne m} W^n\big(\mathbf{y}|\varphi(m, \mathbf{s}), \mathbf{s}, \theta\big) \Big].$$

For a given channel estimate $\hat\theta$, and $0 < \epsilon < 1$, a rate $R \ge 0$
is $\epsilon$-achievable on an estimated channel if, for every $\delta > 0$ and
every sufficiently large $n$, there exists a sequence of length-$n$
block codes such that the rate satisfies $n^{-1} \log_2 M_{\hat\theta} \ge R - \delta$
and $e_{\max}(\varphi, \phi, \hat\theta) \le \epsilon$. This definition requires that the maximum
of the averaged error probability does not exceed $\epsilon$.
For a more robust notion of reliability over single-user
channels we refer the reader to [9]. Then, a rate $R \ge 0$ is
achievable if it is $\epsilon$-achievable for every $0 < \epsilon < 1$. Let $C_\epsilon(\hat\theta)$
be the largest $\epsilon$-achievable rate for a given estimate $\hat\theta$. The
mean capacity over all channel estimates is then defined as
the mean of the largest achievable rate, i.e.,

$$C = \lim_{\epsilon \to 0} E_{\hat\theta}\big[ C_\epsilon(\hat\theta) \big].$$

We next state a theorem quantifying this capacity. 

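Theorem 4.1: Given an estimate $\hat\theta$ known at the receiver
and no channel information at the transmitter, the capacity
of a channel $W(\cdot|x, s, \theta)$ with channel state information
non-causally known at the transmitter is given by

$$C = \max_{P(u,x|s) \in \mathcal{P}(\mathcal{U} \times \mathcal{X})} E_{\hat\theta}\big[ \Psi(P(u,x|s), \hat\theta) \big], \tag{11}$$

where

$$\Psi(P(u,x|s), \hat\theta) = I\big(P_U; \tilde W(\cdot|u, \hat\theta)\big) - I\big(P_S; P_{U|S}\big). \tag{12}$$

In this theorem $\mathcal{P}(\mathcal{U} \times \mathcal{X})$ denotes the set of PMs on
$(\mathcal{U} \times \mathcal{X})$ such that $U \leftrightarrow (X, S) \leftrightarrow Y$ form a Markov chain.
We emphasize that the maximum in (11) is taken over all
input distributions not depending on the channel estimates $\hat\theta$.
The composite channel is

$$\tilde W(y|u, \hat\theta) = \sum_{(x,s)} P(x|u,s)\, P_S(s)\, \tilde W(y|x, s, \hat\theta), \tag{13}$$

with $\tilde W(y|x, s, \hat\theta) = E_{\theta|\hat\theta}\big[ W(y|x, s, \theta) \big]$, where $E_{\theta|\hat\theta}[\cdot]$ denotes
the expectation with respect to the conditional pdf $\psi_{\theta|\hat\theta}$ characterizing the
channel estimation. We also used the mutual information

$$I\big(P_U; \tilde W(\cdot|u, \hat\theta)\big) = \sum_{u} \sum_{y} P(u)\, \tilde W(y|u, \hat\theta) \log_2 \frac{\tilde W(y|u, \hat\theta)}{Q(y|\hat\theta)},$$

with $Q(y|\hat\theta) = \sum_{u} P(u)\, \tilde W(y|u, \hat\theta)$. The capacity can be
attained by using the maximum-likelihood (ML) decoding
metric based on the composite channel model (13) (cf. [10]).
The proof of this coding theorem is straightforward from [1]
and basic information-theoretic properties.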

B. Achievable rates and optimal DPC scheme 

We derive achievable rates for the channel (3) by assuming
Gaussian inputs and both estimation scenarios (4)
and (8). Evaluating (11) requires solving an optimization
problem where we have to determine the optimum
distribution $P(u,x|s)$ maximizing the capacity. We begin by
computing the composite channel model for both estimation
scenarios, i.e. $\tilde W(y|x, s, \hat H_{ML}) = E_{H|\hat H_{ML}}\big[ W(y|x, s, H) \big]$ and
$\tilde W(y|x, s, \hat H_\Delta) = E_{H|\hat H_\Delta}\big[ W(y|x, s, H) \big]$. From (10) it is not
difficult to show that

$$\tilde W(y|x, s, \hat H_{ML}) = \mathcal{CN}\big( \delta \hat H_{ML}(x + s),\ \sigma_Z^2 + \delta \sigma_{\mathcal{E}}^2 |x + s|^2 \big), \tag{14}$$

$$\tilde W(y|x, s, \hat H_\Delta) = \mathcal{CN}\big( \delta_\Delta \hat H_\Delta(x + s),\ \sigma_Z^2 + \delta_\Delta \sigma_{\mathcal{E}_\Delta}^2 |x + s|^2 \big). \tag{15}$$

Actually, we only need to consider the capacity associated
with (14), corresponding to scenario (i), since the pdf (15)
differs only in constant quantities.
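
The mean and variance of the composite channel (14) can be verified empirically by drawing $H$ from the a posteriori pdf (10) and passing a fixed pair $(x, s)$ through the channel (3). The sketch below does this; the helper names and all numerical values are assumptions for illustration.

```python
import random

def complex_gauss(variance, rng):
    """One zero-mean circularly symmetric complex Gaussian sample."""
    s = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def composite_channel_moments(h_hat, x, s, sigma_h2, sigma_e2, sigma_z2):
    """Mean and variance of y given (x, s, h_hat) under the composite model."""
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    mean = delta * h_hat * (x + s)
    var = sigma_z2 + delta * sigma_e2 * abs(x + s) ** 2
    return mean, var

rng = random.Random(1)
h_hat, x, s = complex(0.8, -0.3), complex(1.0, 0.5), complex(-2.0, 1.0)
sigma_h2, sigma_e2, sigma_z2 = 1.0, 0.2, 0.5      # assumed variances
mean, var = composite_channel_moments(h_hat, x, s, sigma_h2, sigma_e2, sigma_z2)

# Empirical check: H ~ CN(delta*h_hat, delta*sigma_e2), then y = H(x+s) + z.
delta = sigma_h2 / (sigma_h2 + sigma_e2)
samples = []
for _ in range(50000):
    h = delta * h_hat + complex_gauss(delta * sigma_e2, rng)
    samples.append(h * (x + s) + complex_gauss(sigma_z2, rng))
emp_mean = sum(samples) / len(samples)
emp_var = sum(abs(y - emp_mean) ** 2 for y in samples) / len(samples)
```

The extra variance term $\delta\sigma_{\mathcal{E}}^2 |x + s|^2$ grows with the state power, which is why the state is no longer fully cancelled once the channel is only estimated.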

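Channel estimates known at the transmitter: If
the channel estimates $\hat H_{ML}$ are known at the transmitter, the
optimal input distribution can be shown to be

$$P(u, x|s) = \begin{cases} P(x) & \text{if } u = x + \alpha^*(\hat H_{ML})\, s, \\ 0 & \text{elsewhere}, \end{cases} \tag{16}$$

where $P(x) = \mathcal{CN}(0, P)$, $P$ is the power constraint and

$$\alpha^*(\hat H_{ML}) = \frac{\delta^2 |\hat H_{ML}|^2 P}{\delta^2 |\hat H_{ML}|^2 P + \sigma_Z^2 + \delta \sigma_{\mathcal{E}}^2 (P + Q)}. \tag{17}$$

The capacity, denoted $C_{TxRx}$, is then

$$C_{TxRx} = E_{\hat H_{ML}}\Big\{ \log_2 \Big( 1 + \frac{\delta^2 |\hat H_{ML}|^2 P}{\sigma_Z^2 + \delta \sigma_{\mathcal{E}}^2 (P + Q)} \Big) \Big\}. \tag{18}$$

This easily follows from the fact that in this case it is possible
to swap expectation and maximization in (11).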
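
The parameter (17) and the average rate (18) are straightforward to evaluate numerically. The sketch below averages (18) over $|\hat H_{ML}|^2$, which is exponentially distributed with mean $\sigma_H^2 + \sigma_{\mathcal{E}}^2$ under Rayleigh fading. Function names, the number of trials and the parameter values are assumptions for illustration.

```python
import math
import random

def alpha_star(h_abs2, P, Q, sigma_h2, sigma_e2, sigma_z2):
    """DPC parameter (17) for a given squared magnitude of the estimate."""
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    signal = delta ** 2 * h_abs2 * P
    noise = sigma_z2 + delta * sigma_e2 * (P + Q)
    return signal / (signal + noise)

def capacity_txrx(P, Q, sigma_h2, sigma_e2, sigma_z2, trials=20000, seed=0):
    """Monte Carlo average of (18) in bits per channel use."""
    rng = random.Random(seed)
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    noise = sigma_z2 + delta * sigma_e2 * (P + Q)
    total = 0.0
    for _ in range(trials):
        # |H_ML|^2 is exponential with mean sigma_h2 + sigma_e2.
        h_abs2 = rng.expovariate(1.0 / (sigma_h2 + sigma_e2))
        total += math.log2(1.0 + delta ** 2 * h_abs2 * P / noise)
    return total / trials

# Assumed setup: Q is 20 dB above P, and two estimation qualities.
rate_good = capacity_txrx(P=1.0, Q=100.0, sigma_h2=1.0, sigma_e2=0.001, sigma_z2=0.01)
rate_poor = capacity_txrx(P=1.0, Q=100.0, sigma_h2=1.0, sigma_e2=0.01, sigma_z2=0.01)
a = alpha_star(1.0, P=1.0, Q=100.0, sigma_h2=1.0, sigma_e2=0.001, sigma_z2=0.01)
```

Because the effective noise $\sigma_Z^2 + \delta\sigma_{\mathcal{E}}^2(P + Q)$ contains the state power $Q$, a larger estimation error lowers the rate even though the transmitter knows the estimate.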

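Channel estimates unknown at the transmitter: Here we
cannot use the optimal DPC scheme (16), because the channel
estimates $\hat H_{ML}$ are not available at the transmitter to compute
the parameter (17). However, assuming Gaussian inputs, which
means that $P(u, x|s)$ is a conditional joint Gaussian pdf, the
optimal DPC scheme can be shown to be given by

$$P(u, x|s) = \begin{cases} P(x) & \text{if } u = x + \alpha s, \\ 0 & \text{elsewhere}, \end{cases} \tag{19}$$

where $\alpha \in [0, 1]$ is the parameter maximizing (11). Hence,
given $\alpha$, the achievable rates can be computed by substituting
(14) and (19) into (12). Thus, using some algebra we obtain

$$I\big(P_U; \tilde W(\cdot|u, \hat H_{ML})\big) = \log_2 \Big( \frac{(\mathsf{P} + \mathsf{Q} + \mathsf{N})(\mathsf{P} + \alpha^2 \mathsf{Q})}{\mathsf{P}\mathsf{Q}(1 - \alpha)^2 + \mathsf{N}(\mathsf{P} + \alpha^2 \mathsf{Q})} \Big), \tag{20}$$

$$I\big(P_S; P_{U|S}\big) = \log_2 \Big( \frac{\mathsf{P} + \alpha^2 \mathsf{Q}}{\mathsf{P}} \Big), \tag{21}$$

where $\mathsf{P} = \delta^2 |\hat H_{ML}|^2 P$, $\mathsf{Q} = \delta^2 |\hat H_{ML}|^2 Q$ and
$\mathsf{N} = \sigma_Z^2 + \delta \sigma_{\mathcal{E}}^2 (P + Q)$. Given $0 \le \alpha \le 1$, by using (20) and (21), the
capacity $C_{Rx}(\alpha)$ writes

$$C_{Rx}(\alpha) = E_{\hat H_{ML}}\Big\{ \log_2 \Big( \frac{\mathsf{P}(\mathsf{P} + \mathsf{Q} + \mathsf{N})}{\mathsf{P}\mathsf{Q}(1 - \alpha)^2 + \mathsf{N}(\mathsf{P} + \alpha^2 \mathsf{Q})} \Big) \Big\}. \tag{22}$$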

We remark that our Gaussian assumption only leads to a lower
bound (22) of the capacity (11). However, in the next section
we shall observe that this bound is tight for realistic SNR
values. It remains to find the optimal parameter
$\alpha$ maximizing (22). Let us first consider the more intuitive
suboptimal choice given by the mean of the optimal $\alpha^*(\hat H_{ML})$
in (17), i.e. $\bar\alpha = E_{\hat H_{ML}}\{\alpha^*(\hat H_{ML})\}$. To compute this mean, we
note that $\hat H_{ML}$ has a Gaussian pdf $\mathcal{CN}(0, \sigma_H^2 + \sigma_{\mathcal{E}}^2)$. Hence,
we can show that

$$\bar\alpha = 1 - \rho \exp(\rho) E_1(\rho), \quad \text{with } \rho = \frac{\sigma_Z^2 + \delta \sigma_{\mathcal{E}}^2 (P + Q)}{\delta^2 P (\sigma_H^2 + \sigma_{\mathcal{E}}^2)}, \tag{23}$$

where $E_1(z) = \int_z^\infty t^{-1} \exp(-t)\, dt$ denotes the exponential
integral function. Therefore, all rates smaller than $C_{Rx}(\bar\alpha)$ are
achievable by using the DPC scheme (19) and the mean $\bar\alpha$ (23).
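
The closed form (23) can be cross-checked against a direct Monte Carlo average of (17). The sketch below is written under stated assumptions: the exponential integral is evaluated with a power series for small arguments and a continued fraction otherwise (a common numerical recipe, not taken from this paper), and the function names and parameter values are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def scaled_e1(z, tol=1e-12, max_iter=500):
    """Return exp(z) * E1(z) for z > 0."""
    if z <= 0.0:
        raise ValueError("z must be positive")
    if z <= 1.0:
        # Series: E1(z) = -gamma - ln z - sum_{k>=1} (-z)^k / (k * k!)
        total, term = -EULER_GAMMA - math.log(z), 1.0
        for k in range(1, max_iter):
            term *= -z / k
            total -= term / k
            if abs(term / k) < tol:
                break
        return math.exp(z) * total
    # Continued fraction for exp(z) * E1(z), modified Lentz iteration.
    b = z + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        step = c * d
        h *= step
        if abs(step - 1.0) < tol:
            break
    return h

def mean_alpha(P, Q, sigma_h2, sigma_e2, sigma_z2):
    """Closed-form mean DPC parameter, equation (23)."""
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    noise = sigma_z2 + delta * sigma_e2 * (P + Q)
    rho = noise / (delta ** 2 * P * (sigma_h2 + sigma_e2))
    return 1.0 - rho * scaled_e1(rho)

def mean_alpha_monte_carlo(P, Q, sigma_h2, sigma_e2, sigma_z2, trials=50000, seed=0):
    """Sample average of alpha*(H_ML) from (17)."""
    rng = random.Random(seed)
    delta = sigma_h2 / (sigma_h2 + sigma_e2)
    noise = sigma_z2 + delta * sigma_e2 * (P + Q)
    total = 0.0
    for _ in range(trials):
        signal = delta ** 2 * rng.expovariate(1.0 / (sigma_h2 + sigma_e2)) * P
        total += signal / (signal + noise)
    return total / trials

params = dict(P=1.0, Q=100.0, sigma_h2=1.0, sigma_e2=0.01, sigma_z2=0.1)  # assumed
alpha_closed = mean_alpha(**params)
alpha_mc = mean_alpha_monte_carlo(**params)
```

Working with $\exp(z)E_1(z)$ rather than $E_1(z)$ avoids overflow of $\exp(\rho)$ at low SNR, where $\rho$ becomes large.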

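Another possibility is to find directly the optimal parameter
$\alpha^*$ maximizing (22). To this end, we observe that

$$\alpha^* = \arg\min_{0 \le \alpha \le 1} E_{\hat H_{ML}}\Big\{ \log_2 \big( \mathsf{P}\mathsf{Q}(1 - \alpha)^2 + \mathsf{N}(\mathsf{P} + \alpha^2 \mathsf{Q}) \big) \Big\}, \tag{24}$$

with $\mathsf{P}$, $\mathsf{Q}$ and $\mathsf{N}$ as defined for (20). Using some algebra, from (24) we can obtain

$$\alpha^* = \arg\min_{0 \le \alpha \le 1} \Big\{ \log_2(P/Q + \alpha^2) + \log_2(e)\, \exp\big(c(\alpha)\big) E_1\big(c(\alpha)\big) \Big\}, \quad c(\alpha) = \frac{\rho\,(P/Q + \alpha^2)}{(1 - \alpha)^2}, \tag{25}$$

with $\rho$ as in (23). Unfortunately, no explicit solution exists for $\alpha^*$ in
(25). However, this optimization can be solved numerically
to then compute $C_{Rx}(\alpha^*)$.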
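
One simple way to carry out this numerical optimization is a grid search over $\alpha \in [0, 1]$ on a sample-average version of (22), reusing the same fading samples for every $\alpha$. The sketch below follows that approach; it is an assumed procedure for illustration (the paper does not state which solver was used), and the names, grid size and parameter values are hypothetical.

```python
import math
import random

def rate_rx(alpha, gains, P, Q, noise):
    """Sample average of (22); each gain equals delta^2 |H_ML|^2.
    The common factor 'gain' has been cancelled inside the logarithm."""
    total = 0.0
    for g in gains:
        num = P * (g * (P + Q) + noise)
        den = g * P * Q * (1.0 - alpha) ** 2 + noise * (P + alpha ** 2 * Q)
        total += math.log2(num / den)
    return total / len(gains)

def best_alpha(gains, P, Q, noise, grid=101):
    """Grid search for the alpha in [0, 1] maximizing the average rate."""
    alphas = [k / (grid - 1) for k in range(grid)]
    rates = [rate_rx(a, gains, P, Q, noise) for a in alphas]
    k_best = max(range(grid), key=lambda k: rates[k])
    return alphas[k_best], rates[k_best]

# Assumed setup: SNR = 20 dB, Q 20 dB above P, training length 10, P_T = P.
P, Q, sigma_h2, sigma_z2 = 1.0, 100.0, 1.0, 0.01
n_train, P_T = 10, 1.0
sigma_e2 = sigma_z2 / (n_train * P_T)            # from (5)
delta = sigma_h2 / (sigma_h2 + sigma_e2)
noise = sigma_z2 + delta * sigma_e2 * (P + Q)

rng = random.Random(0)
gains = [delta ** 2 * rng.expovariate(1.0 / (sigma_h2 + sigma_e2))
         for _ in range(2000)]
alpha_opt, rate_opt = best_alpha(gains, P, Q, noise)
```

Using common random samples across the grid keeps the comparison between different values of $\alpha$ free of sampling noise; a finer grid or a scalar minimizer could replace the grid search if more precision is needed.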

All results derived in this section are also valid for the
channel model (15), corresponding to the estimation scenario
(ii). We replace $\delta$ with $\delta_\Delta$ and $\sigma_{\mathcal{E}}^2$ with $\sigma_{\mathcal{E}_\Delta}^2$ in all expressions.

Fig. 2. Optimal parameter $\alpha^*$ (solid lines) vs the SNR, for various training
sequence lengths $N$. Dashed lines show mean alpha $\bar\alpha$.

V. Simulation results and discussions 

In this section, numerical results are presented based
on Monte Carlo simulations. Fig. 2 shows both the mean
parameter $\bar\alpha$ (23) and the optimal parameter $\alpha^*$ (25) versus
the signal-to-noise ratio, for various training sequence lengths
$N$. The state sequence power $Q$ is 20 dB larger than that of
the channel input $P$, and the training power is $P_T = P$. We
can observe that both parameters are relatively close for many
SNR values. Furthermore, even in the SNR ranges where the
values seem to be quite different, we have observed that the
achievable rates with $\bar\alpha$ are very close to those provided by the
optimal solution $\alpha^*$. Therefore, we can conclude that the mean
parameter can be used to design the optimal DPC scheme.

Fig. 3 shows achievable rates (22) (in bits per channel use)
with channel estimates unknown at the transmitter versus the
SNR, for various training sequence lengths $N \in \{1, 10, 20\}$
(dashed lines). For comparison we also show achievable rates
(18) with channel estimates known at the transmitter (dashed-dot
lines) and with perfect channel knowledge at both transmitter
and receiver (solid line). It is seen that the average rates
increase rather fast with the amount of training. For
example, to achieve 2 bits with channel estimates unknown at
the transmitter, a scheme with estimated channel
and $N = 10$ requires 18 dB, i.e., 11 dB more than with
perfect channel information, whereas if the training length
is further reduced to $N = 1$, this gap increases to 27 dB. On
the other hand, when the channel estimates are known at the
transmitter, the SNR required for 2 bits is only 1 dB less
than in the case with channel estimates unknown. This gain
is small, and consequently we can conclude that the
knowledge of the channel estimates at the transmitter is not
really necessary with the proposed DPC scheme.

Finally, we study the impact of the state sequence power on
the achievable rates. Fig. 4 shows similar plots for different
values of $Q \in \{+20, +30, +40\}$ dB, i.e., $Q$ exceeds the channel
input power $P$ by this amount, and the training sequence
length is $N = 10$. We can observe that the performance is
very sensitive to the power $Q$. This is because with imperfect
channel estimation the capacity still depends on $Q$ (cf. (22)),
while with perfect channel information the state sequence is
cancelled at the transmitter independently of the power $Q$.

Fig. 3. Achievable rates with channel estimates known at the transmitter
(dashed-dot lines) vs the SNR, for various training sequence lengths $N$.
Dashed lines suppose channel estimates unknown at the transmitter. Solid
line shows the capacity with the channel known at both transmitter and receiver.

Fig. 4. Similar plots for different power values of the state sequence $Q$.

VI. Conclusion 

In this paper we studied the problem of communicating
reliably over unknown channels with channel states
non-causally known at the transmitter. We assumed that no channel
information is available at the transmitter and imperfect channel
information is available at the receiver, i.e., the receiver
only has access to a noisy estimate of the channel. In this
scenario, we proposed to characterize the information-theoretic
limits through the notion of reliable communication based on
the average of the error probability over all channel estimation
errors. We presented an explicit expression, for general DMCs,
of its maximal achievable rate averaged over all channel
estimates. Then, we computed mean achievable rates for the
fading Costa's channel with ML channel estimation and Gaussian
inputs. We also studied optimal training design adapted
to each application, e.g. Broadcast Channels or watermarking.

The somewhat unexpected result is that, while it is well
known that DPC requires perfect channel knowledge at both
transmitter and receiver, significant gains can still be achieved
without channel information at the transmitter by using the
proposed DPC scheme. Further numerical
results show that, under the assumption of imperfect channel
information at the receiver, knowing the channel estimates
at the transmitter does not lead to large rate increases.

Codes achieving capacity do not need to be long to exploit
the long-term ergodic properties of the estimated fading
process, and can be applied when the real transmission time
is not large compared to the coherence time of the channel.